NSF PAR Search | NSF Public Access Repository

Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning

Guo, Zijian; Zhou, Weichao; Wang, Shengao; Li, Wenchao (March 2025, The Thirteenth International Conference on Learning Representations)

Offline safe reinforcement learning (OSRL) aims to learn policies that achieve high rewards while satisfying safety constraints, using only data collected offline. However, the learned policies often struggle to handle states and actions that are absent from the offline dataset or out-of-distribution (OOD) with respect to it, which can result in safety-constraint violations or overly conservative behavior during online deployment. Moreover, many existing methods are unable to learn policies that can adapt to varying constraint thresholds. To address these challenges, we propose constraint-conditioned actor-critic (CCAC), a novel OSRL method that models the relationship between state-action distributions and safety constraints, and leverages this relationship to regularize critics and policy learning. CCAC learns policies that can effectively handle OOD data and adapt to varying constraint thresholds. Empirical evaluations on benchmarks show that CCAC significantly outperforms existing methods for learning adaptive, safe, and high-reward policies.
Full Text Available
Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment

Zhou, Weichao; Li, Wenchao (December 2024, 2024 Conference on Neural Information Processing Systems)

Many imitation learning (IL) algorithms use inverse reinforcement learning (IRL) to infer a reward function that aligns with the demonstration. However, the inferred reward functions often fail to capture the underlying task objectives. In this paper, we propose a novel framework for IRL-based IL that prioritizes task alignment over conventional data alignment. Our framework is a semi-supervised approach that leverages expert demonstrations as weak supervision to derive a set of candidate reward functions that align with the task rather than only with the data. It then adopts an adversarial mechanism to train a policy with this set of reward functions to gain a collective validation of the policy's ability to accomplish the task. We provide theoretical insights into this framework's ability to mitigate task-reward misalignment and present a practical implementation. Our experimental results show that our framework outperforms conventional IL baselines in complex and transfer learning scenarios.
Full Text Available
HyQE: Ranking Contexts with Hypothetical Query Embeddings

Zhou, Weichao; Zhang, Jiaxin; Hasson, Hilaf; Singh, Anu; Li, Wenchao (November 2024, The 2024 Conference on Empirical Methods in Natural Language Processing)

Retrieval-augmented generation (RAG) systems can effectively address user queries by leveraging indexed document corpora to retrieve the relevant contexts. Ranking techniques have been adopted in RAG systems to sort the retrieved contexts by their relevance to the query so that users can select the most useful contexts for their downstream tasks. Many existing ranking methods rely on the similarity between the embedding vectors of the context and the query to measure relevance, but similarity does not equate to relevance in all scenarios. Some ranking methods use large language models (LLMs) to rank the contexts by putting the query and the candidate contexts in the prompt and asking the LLM about their relevance. The scalability of those methods is limited by the number of candidate contexts and the context window of those LLMs. Those methods also require fine-tuning the LLMs, which can be computationally expensive and requires domain-related data. In this work, we propose a scalable ranking framework that does not involve LLM training. Our framework uses an off-the-shelf LLM to hypothesize the user's query based on the retrieved contexts and ranks the contexts based on the similarity between the hypothesized queries and the user query. Our framework is efficient at inference time and is compatible with many other context retrieval and ranking techniques. Experimental results show that our method improves the ranking performance of retrieval systems on multiple benchmarks.
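The ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed`, the `FAKE_LLM_QUERIES` lookup that stands in for an off-the-shelf LLM, and the choice to score each context by its best-matching hypothesized query are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_contexts(
    query: str,
    contexts: List[str],
    hypothesize: Callable[[str], List[str]],
) -> List[Tuple[str, float]]:
    """Rank contexts by similarity between the user query and the
    queries hypothesized from each context."""
    q_vec = embed(query)
    scored = []
    for ctx in contexts:
        hypothetical_queries = hypothesize(ctx)
        # Assumption: aggregate with max over the hypothesized queries.
        score = max(
            (cosine(q_vec, embed(h)) for h in hypothetical_queries),
            default=0.0,
        )
        scored.append((ctx, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for an LLM call that proposes queries a context could answer.
FAKE_LLM_QUERIES = {
    "doc about cats": ["how do cats sleep"],
    "doc about tax": ["how to file a tax return"],
}


def fake_hypothesize(context: str) -> List[str]:
    return FAKE_LLM_QUERIES.get(context, [])


ranked = rank_contexts(
    "how do I file my tax return",
    ["doc about cats", "doc about tax"],
    fake_hypothesize,
)
```

In this sketch the contexts themselves are never embedded against the query; only the hypothesized queries are, so their embeddings could in principle be computed once per context and reused across user queries.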
Full Text Available
Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning

Guo, Zijian; Zhou, Weichao; Li, Wenchao (July 2024, 2024 International Conference on Machine Learning)

Offline safe reinforcement learning (RL) aims to train a constraint-satisfying policy from a fixed dataset. Current state-of-the-art approaches are based on supervised learning with a conditioned policy. However, these approaches fall short in real-world applications that involve complex tasks with rich temporal and logical structures. In this paper, we propose temporal logic Specification-conditioned Decision Transformer (SDT), a novel framework that combines the expressive power of signal temporal logic (STL), used to specify complex temporal rules that an agent should follow, with the sequential modeling capability of Decision Transformer (DT). Empirical evaluations on the DSRL benchmarks demonstrate that SDT learns safe and high-reward policies more effectively than existing approaches. In addition, SDT aligns well with the desired degree of satisfaction of the STL specification on which it is conditioned.
Full Text Available
REGLO: Provable Neural Network Repair for Global Robustness Properties

https://doi.org/10.1609/aaai.v38i11.29094

Fu, Feisi; Wang, Zhilu; Zhou, Weichao; Wang, Yixuan; Fan, Jiameng; Huang, Chao; Zhu, Qi; Chen, Xin; Li, Wenchao (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

We present REGLO, a novel methodology for repairing pretrained neural networks to satisfy global robustness and individual fairness properties. A neural network is said to be globally robust with respect to a given input region if and only if all the input points in the region are locally robust. This notion of global robustness also captures the notion of individual fairness as a special case. We prove that any counterexample to a global robustness property must exhibit a corresponding large gradient. For ReLU networks, this result allows us to efficiently identify the linear regions that violate a given global robustness property. By formulating and solving a suitable robust convex optimization problem, REGLO then computes a minimal weight change that will provably repair these violating linear regions.
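The definition of global robustness stated in this abstract can be written out as follows. This is a sketch under assumptions: the network $f$, input region $X$, perturbation radius $\delta$, output tolerance $\epsilon$, and the norm-ball form of local robustness are a common formulation chosen for illustration, not notation taken from the abstract.

```latex
% Assumed (common) form of local robustness at a point x
f \text{ is locally robust at } x
  \iff \forall x'.\ \|x' - x\| \le \delta \implies \|f(x') - f(x)\| \le \epsilon

% Global robustness over a region X, as described in the abstract
f \text{ is globally robust on } X
  \iff \forall x \in X.\ f \text{ is locally robust at } x
```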
Full Text Available
POLAR-Express: Efficient and Precise Formal Reachability Analysis of Neural-Network Controlled Systems

https://doi.org/10.1109/TCAD.2023.3331215

Wang, Yixuan; Zhou, Weichao; Fan, Jiameng; Wang, Zhilu; Li, Jiajun; Chen, Xin; Huang, Chao; Li, Wenchao; Zhu, Qi (March 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Verification and Design of Robust and Safe Neural Network-enabled Autonomous Systems

https://doi.org/10.1109/Allerton58177.2023.10313451

Zhu, Qi; Li, Wenchao; Huang, Chao; Chen, Xin; Zhou, Weichao; Wang, Yixuan; Li, Jiajun; Fu, Feisi (September 2023, IEEE)

Runtime-Safety-Guided Policy Repair

https://doi.org/10.1007/978-3-030-60508-7_7

Zhou, Weichao; Gao, Ruihan; Kim, BaekGyu; Kang, Eunsuk; Li, Wenchao (October 2020, International Conference on Runtime Verification (RV))
Full Text Available
Safety-Aware Apprenticeship Learning

Zhou, Weichao; Li, Wenchao (January 2018, Computer-Aided Verification)

Full Text Available
